ABSTRACT



A study was conducted to compare, with simulated unidimensional and
two-dimensional data sets, the Type I error probabilities and rejection
rates obtained with two versions of the LISREL computer program: the
earlier version, PRELIS/LISREL7, and the later version, PRELIS2/LISREL8,
which corrects the estimation of the asymptotic covariance matrix.
Unidimensional data sets were generated according to sample sizes of 2,500
and 5,000 and test lengths of 10 and 20 items. Two-dimensional item response
vectors were generated for the same sample sizes and test lengths. Findings
with unidimensional data sets suggest that the correction in PRELIS2
resulted in higher Type I error rates with the chi-square goodness-of-fit
statistic. Rejection rates obtained using the LISREL8 chi-square fit
statistic were high across all simulated two-dimensional conditions. Results
suggest that the LISREL7 chi-square fit statistic should be recommended to
the researcher interested in determining whether the assumption of
unidimensionality has been violated. (Contains 2 tables and 29 references.)
(SLD)
An Empirical Comparison of Two LISREL Chi-Square Goodness-of-Fit Statistics and the Implications for Dimensionality Assessment of Item Response Data

Item response theory (IRT) models have been used extensively by researchers and practitioners
over the past three decades to address a myriad of measurement-related issues. These issues include
score equating (Lord, 1977; 1980) and, more recently, the development of computerized adaptive test forms
(Hambleton, Zaal, & Pieters, 1993; Wainer, Dorans, Flaugher, Green, Mislevy, Steinberg, & Thissen, 1990).
IRT models have also been heavily relied upon in the assembly of nationally administered admissions
examinations (Stocking, 1988) as well as licensure examinations (Luecht, De Champlain, & Nungester, 1996).

The legitimate use of common IRT models, such as the family of logistic models implemented in
popular software packages (e.g., BIGSTEPS; Linacre & Wright, 1993; BILOG; Mislevy & Bock, 1990),
nonetheless requires that several strict assumptions be met. One of these assumptions is unidimensionality
of the latent proficiency space. For example, the three-parameter logistic IRT model (Lord & Novick,
1968), given by

$$P(x_i = 1 \mid a_i, b_i, c_i, \theta_j) = c_i + (1 - c_i)\,\frac{e^{D a_i(\theta_j - b_i)}}{1 + e^{D a_i(\theta_j - b_i)}}, \qquad (1)$$



assumes that the probability of correctly answering item i (denoted by $x_i = 1$) depends upon the item
discrimination (a), difficulty (b) and lower asymptote (c) parameters as well as a latent trait or proficiency
($\theta$) postulated to underlie the item responses. Clearly, the assumption of unidimensionality is often
compromised with actual achievement data sets, where the response to an item usually depends not only
on the hypothesized proficiency but also on several ancillary abilities. This is illustrated quite
clearly by the dependencies that frequently exist among reading comprehension items referring to a
common stem (passage). In that instance, the dimensional structure of the item response matrix is
(potentially) augmented by the presence of content-related factors that are unrelated to the proficiency
underlying the item response matrix (e.g., reading proficiency). This has led to the elaboration of a
multitude of descriptive and inferential statistics to assess dimensionality or, more commonly, departure
from the assumption of unidimensionality. De Champlain & Gessaroli (in press) have provided an outline
of most of the procedures proposed thus far in this area along with their respective contributors.
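
As a minimal numerical sketch of Equation (1), the function below evaluates the three-parameter logistic response probability for a single item and examinee. The scaling constant D = 1.7 and the item parameter values are assumptions chosen purely for illustration; they are not taken from this study.

```python
import math


def p_correct_3pl(theta, a, b, c, D=1.7):
    """Probability of a correct response under the three-parameter logistic model.

    theta: examinee proficiency; a: discrimination; b: difficulty;
    c: lower asymptote. D = 1.7 is an assumed scaling constant.
    """
    z = D * a * (theta - b)
    # e^z / (1 + e^z) is written in the equivalent form 1 / (1 + e^-z).
    return c + (1.0 - c) / (1.0 + math.exp(-z))


# Hypothetical item (a = 1.0, b = 0.0, c = 0.2) at three proficiency levels.
for theta in (-2.0, 0.0, 2.0):
    print(theta, round(p_correct_3pl(theta, a=1.0, b=0.0, c=0.2), 4))
```

When theta equals b, the probability sits halfway between c and 1; setting c = 0 gives the two-parameter logistic function.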

The use of indices and statistics based on nonlinear factor analysis (NLFA) to assess the 
dimensionality of item response data has proven to be popular due primarily to the efforts of Bartholomew 
(1983), McDonald (1967) and Takane and de Leeuw (1987). These researchers have shown that common 
IRT models and NLFA functions are mathematically equivalent. Chi-square goodness-of-fit statistics have 
been derived for use with both limited and full information factor analytic models as well as a variety of 
estimation procedures (Bock, Gibbons, & Muraki, 1988; Gessaroli & De Champlain, 1996; 1997; Gibbons 
& Hedeker, 1992; Jöreskog & Sörbom, 1993a; 1993b; Muthén, 1978).

Among the factor analytic packages that are commercially available, PRELIS2/LISREL8 
(Jöreskog & Sörbom, 1993a; 1993b) is particularly appealing in that it allows the user to fit confirmatory
factor analytic models to a dichotomous item response matrix via several estimation procedures. For 
example, it is possible to determine the extent to which the assumption of unidimensionality has been 
violated with a given data set by simply fitting a one-factor model to a data set prior to calibrating the item 
responses using an IRT model. The parameters of factor analytic models in LISREL are estimated so as to 
minimize the following fit function: 

$$F = (s - \sigma)' W^{-1} (s - \sigma), \qquad (2)$$

where

$s$ = Sample item covariance matrix;

$\sigma$ = Population covariance matrix;

$W^{-1}$ = A weight matrix referred to as the correct weight matrix.

With dichotomous item responses, $s$ usually corresponds to sample estimates of the thresholds and
tetrachoric correlations, $\sigma$ contains the population threshold and tetrachoric correlation values, and
$W$ is a consistent estimator of the asymptotic covariance matrix of $s$, so that $W^{-1}$ serves as the weight
matrix. Note that the latter weight matrix can be estimated solely with the LISREL7 and LISREL8 versions
of the program (Jöreskog & Sörbom, 1991a; 1991b; 1993a; 1993b).
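
The quadratic form in Equation (2) can be sketched in a few lines. The code below is an illustration only, not the LISREL implementation: it assumes that W holds an estimate of the asymptotic covariance matrix of s (so that its inverse acts as the weight matrix), and the function names and numerical values are hypothetical.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's inputs are left untouched.
    M = [[float(v) for v in A[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("W is singular or nearly singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def wls_fit(s, sigma, W):
    """Evaluate F = (s - sigma)' W^{-1} (s - sigma) without forming W^{-1}."""
    d = [si - gi for si, gi in zip(s, sigma)]
    x = solve_linear(W, d)  # x = W^{-1} d
    return sum(di * xi for di, xi in zip(d, x))


# Hypothetical two-element example with a diagonal W.
F = wls_fit(s=[0.50, 0.30], sigma=[0.40, 0.35], W=[[0.01, 0.0], [0.0, 0.02]])
print(F)
```

Solving the linear system rather than inverting W explicitly is a common numerical choice; in practice the model-implied values in sigma would be updated iteratively until F reaches its minimum.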

A chi-square goodness-of-fit statistic, provided in LISREL7 and LISREL8 to aid in assessing
model fit, is given by

$$\chi^2 = (N - 1) \cdot \mathrm{Min}(F), \qquad (3)$$

where N corresponds to the number of examinees in the sample and Min(F) is the minimum value of the fit
function given in Equation (2) for a specific model. This statistic is distributed asymptotically as a
chi-square distribution with degrees of freedom equal to

$$0.5\,p\,(p + 1) - t,$$

where p is equal to the number of items and t is the number of independent parameters estimated in the
model.
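
The arithmetic in Equation (3) and the degrees-of-freedom expression can be illustrated as follows. The minimum fit function value and the parameter count t = 20 (taken here, as an assumption, to be 10 thresholds plus 10 loadings for a one-factor model fitted to 10 items) are hypothetical and are not values reported in this study.

```python
def chi_square_statistic(n_examinees, f_min):
    """Chi-square statistic of Equation (3): (N - 1) times the minimum of F."""
    return (n_examinees - 1) * f_min


def degrees_of_freedom(p, t):
    """Degrees of freedom 0.5 * p * (p + 1) - t for p items and t free parameters."""
    return p * (p + 1) // 2 - t


# Hypothetical case: N = 2500 examinees, minimum fit function value 0.02.
print(chi_square_statistic(2500, 0.02))
# Assumed parameter count for a one-factor model with 10 dichotomous items.
print(degrees_of_freedom(p=10, t=20))
```

Because the statistic scales with N - 1, a fixed amount of misfit in F produces a larger chi-square value as the sample grows.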

Researchers have proposed several methods for estimating the asymptotic variances and
covariances of polychoric correlations (Gunsjö, 1994; Muthén, 1984). Christoffersson and Gunsjö (1996)
suggest using a Taylor expansion for the equations that define the two-step estimator for
tetrachoric/polychoric correlations. Jöreskog (1994) suggests that a contingency table approach can be
utilized: thresholds can be estimated from the univariate marginals while polychoric correlations can be
estimated from the bivariate marginals. Specifically, assuming that

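[Equation (4) is not legible in the source.]

is the vector of estimated threshold values and

$$\hat{\rho} = (\hat{\rho}_{21}, \hat{\rho}_{31}, \hat{\rho}_{32}, \hat{\rho}_{41}, \hat{\rho}_{42}, \hat{\rho}_{43}, \ldots, \hat{\rho}_{p,p-1}) \qquad (5)$$

is a vector containing all estimated polychoric correlation values, then the asymptotic covariance matrix
of $\hat{\rho}$ can be obtained as

[Equation (6) is not legible in the source.]

where

[Equation (7) is not legible in the source.]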

In Equations (6) and (7), $\pi^{(ghij)}_{abcd}$ are the probabilities of variables g, h, i and j in a four-way
contingency table, the latter being consistently estimated by their corresponding sample proportions
$p^{(ghij)}_{abcd}$. Jöreskog (1994) also offers an alternative method of estimating these probabilities without
the use of the four-way contingency table. Readers interested in more detail regarding this approach and
related issues should refer to Jöreskog (1994).



A Comparison of Two LISREL Chi-Square Statistics 




It is important to point out that the estimation procedure advocated by Jöreskog (1994) has only
been implemented in the most recent version of LISREL (i.e., LISREL8). In fact, Jöreskog & Sörbom
(1993a) clearly state that in earlier versions of the software (e.g., PRELIS/LISREL7; Jöreskog & Sörbom,
1991a; 1991b) the asymptotic covariance matrix is incorrect, as it is based on the sometimes erroneous
premise that two different polychoric correlations are asymptotically uncorrelated for given thresholds. The
authors state that the correction implemented in the more recent version of LISREL enables the user to
obtain a consistent estimate of the asymptotic covariance matrix of polychoric correlations without having
to accept the simplifying assumption inherent in the earlier version of the program. According to
Jöreskog & Sörbom (1993a), this correction should also improve the chi-square goodness-of-fit statistic
(i.e., yield better control over Type I error probabilities as well as increase power). In spite of this claim,
little empirical work has been undertaken to compare Type I error rates and rejection rates of the
chi-square goodness-of-fit statistic prior to and after implementation of the correction to the estimation
of the asymptotic covariance matrix.

The purpose of the present investigation was to compare, with simulated unidimensional and two- 
dimensional data sets, Type I error probabilities and rejection rates obtained with the PRELIS/LISREL7 
(prior to correction) and PRELIS2/LISREL8 (after correction) chi-square goodness-of-fit statistics. 

Methods 

Unidimensional conditions 

In the first part of this investigation, the empirical Type I error rates of both statistics were
computed under various conditions. Unidimensional item response vectors were simulated using a
two-parameter logistic IRT function. The latter function is equivalent to the model outlined in Equation (1)
with a zero lower asymptote parameter value. In addition, the unidimensional data sets were generated
according to two sample sizes (2500 and 5000 examinees) as well as two test lengths (10 and 20 items).
The IRT item parameters used to simulate the item response vectors were selected from a nationally
administered admissions examination. Note that the simulated 20-item data sets were composed of two
10-item tests. The item parameters utilized to simulate responses to items 11-20 were therefore identical to
those selected to generate responses to items 1-10. Proficiencies were randomly generated from a N(0,1)
distribution. Each cell of this 2 x 2 design was replicated 100 times for a total of 400 unidimensional data
sets.
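
The data generation described above can be sketched as follows. This is an illustration under stated assumptions rather than the generator used in the study: the item parameter values are hypothetical (the actual parameters came from an admissions examination and are not listed here), and the scaling constant D = 1.7 is assumed.

```python
import math
import random


def simulate_2pl(n_examinees, a_params, b_params, D=1.7, seed=0):
    """Simulate dichotomous responses under a two-parameter logistic model."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_examinees):
        theta = rng.gauss(0.0, 1.0)  # proficiency drawn from N(0, 1)
        row = []
        for a, b in zip(a_params, b_params):
            p = 1.0 / (1.0 + math.exp(-D * a * (theta - b)))
            row.append(1 if rng.random() < p else 0)
        data.append(row)
    return data


# Hypothetical parameters for a 10-item test.
a10 = [0.8, 1.0, 1.2, 0.9, 1.1, 0.7, 1.3, 1.0, 0.95, 1.05]
b10 = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, -0.25, 0.25, 0.75]

# The 20-item condition reuses the 10-item parameters for items 11-20.
a20, b20 = a10 * 2, b10 * 2

responses = simulate_2pl(2500, a20, b20, seed=1)
print(len(responses), len(responses[0]))
```

Repeating the call with 100 different seeds for each combination of sample size and test length would give the 400 data sets of the 2 x 2 design.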



Two-dimensional conditions 

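In the second part of this investigation, two-dimensional item response vectors were simulated
using the following multidimensional two-parameter compensatory logistic IRT model (Reckase, 1985):

$$P(x_i = 1 \mid \mathbf{a}_i, d_i, \boldsymbol{\theta}_j) = \frac{e^{\mathbf{a}_i' \boldsymbol{\theta}_j + d_i}}{1 + e^{\mathbf{a}_i' \boldsymbol{\theta}_j + d_i}}, \qquad (8)$$

where

$\mathbf{a}_i$ = a vector of discrimination parameters for item i;
$d_i$ = a scalar parameter related to the difficulty of item i;
$\boldsymbol{\theta}_j$ = a latent trait vector.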

These two-dimensional item response vectors were generated according to the same two sample
sizes (2500 and 5000 examinees) and two test lengths (10 and 20 items) outlined in the previous section.
These vectors were also generated according to the following dimension dominance and latent
trait correlation conditions:

Dimension dominance: 50% of the items on $\theta_1$ and 50% of the items on $\theta_2$.

                     80% of the items on $\theta_1$ and 20% of the items on $\theta_2$.
                     This corresponds to a weak two-dimensional structure.

Latent trait correlation: 0.0 and 0.7.

The parameters used to generate the two-dimensional item response vectors were identical to those
selected for the unidimensional simulations. As was previously outlined, the parameters selected to
simulate responses to items 11-20 were the same as those employed to generate responses to items 1-10.
Finally, proficiencies were randomly generated from a N(0,1) distribution. Each cell of this
2 x 2 x 2 x 2 design was replicated 100 times for a total of 1600 two-dimensional data sets.
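
A rough sketch of the two-dimensional generation step is given below. It makes several assumptions that the text does not confirm: each item is taken to load on exactly one of the two proficiencies (simple structure), the discrimination and difficulty values are hypothetical, and the correlated proficiencies are obtained by mixing two independent N(0,1) draws.

```python
import math
import random


def draw_correlated_thetas(rng, rho):
    """Draw a pair of standard normal proficiencies with correlation rho."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    return z1, rho * z1 + math.sqrt(1.0 - rho * rho) * z2


def loading_pattern(n_items, share_on_first, a_value=1.0):
    """Assumed simple structure: each item measures only one proficiency."""
    n_first = round(n_items * share_on_first)
    return [(a_value, 0.0)] * n_first + [(0.0, a_value)] * (n_items - n_first)


def simulate_m2pl(n_examinees, a_vectors, d_params, rho, seed=0):
    """Simulate responses under a compensatory two-dimensional logistic model."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_examinees):
        theta = draw_correlated_thetas(rng, rho)
        row = []
        for a, d in zip(a_vectors, d_params):
            z = a[0] * theta[0] + a[1] * theta[1] + d  # a' theta + d
            p = 1.0 / (1.0 + math.exp(-z))
            row.append(1 if rng.random() < p else 0)
        data.append(row)
    return data


# One cell of the design: 10 items, 2500 examinees, 80%:20% dominance,
# proficiency correlation 0.7, with hypothetical d values of zero.
responses = simulate_m2pl(2500, loading_pattern(10, 0.8), [0.0] * 10, rho=0.7, seed=1)
print(len(responses), len(responses[0]))
```

The mixing step yields a second proficiency with unit variance and the requested correlation with the first, which is equivalent to a Cholesky factorization of the 2 x 2 correlation matrix.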

Analyses 

Initially, the asymptotic covariance matrix of the tetrachoric correlations was estimated for all data
sets using PRELIS and PRELIS2. The parameters for a one-factor model were then estimated using
weighted least-squares, which enabled the computation of the chi-square goodness-of-fit statistics with both
LISREL7 and LISREL8. A nominal Type I error rate of .05 was selected for all analyses. Regarding
unidimensional data sets, a logit-linear analysis was undertaken to model the effects of test length, sample
size, asymptotic covariance matrix estimation procedure (i.e., PRELIS vs. PRELIS2) and the interactions of
these variables on decision accuracy, i.e., the number of times the assumption of
unidimensionality was accepted and rejected (Type I error). For two-dimensional data sets, the effects of
test length, sample size, dimension dominance, latent trait correlation, asymptotic covariance matrix
estimation procedure and the various interaction terms of these factors on decision accuracy
were also estimated via a logit-linear analysis. Note that the logit-linear analyses were undertaken in a
forward hierarchical fashion, that is, starting with the simplest main effect and progressing towards
incrementally more complex models while heeding the principle that higher-order effects are included in
the model solely if the corresponding lower-order effects are also included. A model was deemed
acceptable if its corresponding p-value exceeded 0.15. Also, effects with z-values greater than 2.00 were
treated as statistically significant. For the sake of consistency, significant associations in the logit-linear
analyses will be discussed only in light of the independent variable(s). For example, should the decision
accuracy by asymptotic covariance matrix estimation procedure association be statistically significant, it
will be referred to as the effect of asymptotic covariance matrix estimation procedure.

Results 

Unidimensional data sets 

The number of rejections of the assumption of unidimensionality for each simulated condition is 
shown in Table 1. 



Insert Table 1 about here 



Empirical Type I error rates ranged from 0.00 (for LISREL7 chi-square values based on data sets generated
to contain 10 items and 2500 examinees as well as 20 items and 2500/5000 examinees) to 0.30 (for LISREL8
chi-square values associated with data sets simulated to contain 20 items and 2500 examinees). The results
from the logit-linear analysis indicate that a model containing the main effects of test length, sample size
and asymptotic covariance matrix estimation procedure is sufficient to adequately account for the empirical
Type I error rates, $L^2(4) = 5.105$, p = .277. Regarding the test length effect, the empirical Type I error rate
computed for the 10-item data sets (0.05 or 21/400 incorrect rejections of unidimensionality) was
significantly lower than that obtained with the 20-item data sets (0.12 or 49/400 incorrect rejections of
unidimensionality). Similarly, with respect to the sample size effect, the empirical Type I error probability
estimated for data sets simulated to contain 2500 examinees (0.11 or 43/400 incorrect rejections of the
assumption of unidimensionality) was significantly greater than that associated with the 5000-examinee data
sets (0.07 or 27/400 incorrect rejections of the assumption of unidimensionality). Finally, and more
importantly given the primary aim of this investigation, the asymptotic covariance matrix estimation
procedure effect indicates that the empirical Type I error rate computed for the LISREL7 chi-square (0.005
or 2/400 incorrect rejections of the assumption of unidimensionality) was significantly lower than that
obtained with the LISREL8 chi-square statistic (0.17 or 68/400 incorrect rejections of the assumption of
unidimensionality).
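
The marginal rates quoted above can be recovered directly from the cell counts in Table 1. The short sketch below pools the cells for each level of a factor; the function and variable names are illustrative only.

```python
# Rejections per 100 replications, as reported in Table 1.
# Keys are (LISREL version, test length, sample size).
rejections = {
    ("LISREL7", 10, 2500): 0,
    ("LISREL7", 10, 5000): 2,
    ("LISREL7", 20, 2500): 0,
    ("LISREL7", 20, 5000): 0,
    ("LISREL8", 10, 2500): 13,
    ("LISREL8", 10, 5000): 6,
    ("LISREL8", 20, 2500): 30,
    ("LISREL8", 20, 5000): 19,
}
REPLICATIONS = 100


def marginal_rate(position, level):
    """Pool all cells whose key has `level` at `position`; return (count, n, rate)."""
    cells = [count for key, count in rejections.items() if key[position] == level]
    total = sum(cells)
    n = len(cells) * REPLICATIONS
    return total, n, total / n


for label, position, level in [
    ("10 items", 1, 10),
    ("20 items", 1, 20),
    ("N = 2500", 2, 2500),
    ("N = 5000", 2, 5000),
    ("LISREL7", 0, "LISREL7"),
    ("LISREL8", 0, "LISREL8"),
]:
    total, n, rate = marginal_rate(position, level)
    print(f"{label}: {total}/{n} = {rate:.4f}")
```

Each marginal pools four cells of 100 replications, which is why every rate is expressed out of 400 data sets.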

Two-dimensional data sets 

The number of rejections of the assumption of unidimensionality for each simulated condition is 
shown in Table 2. 



Insert Table 2 about here 



As shown in Table 2, the assumption of unidimensionality was correctly rejected for all simulated 
data sets using the LISREL8 chi-square statistic. With the exception of two conditions, the assumption of 
unidimensionality was also correctly rejected for all simulated data sets using the LISREL7 chi-square 
statistic. More precisely, the use of the LISREL7 chi-square statistic incorrectly led to the acceptance of the 
null hypothesis of unidimensionality for 50/100 data sets simulated to contain 10 items, 2500 examinees, 
according to a weak two-dimensional structure and with a correlation of .70 between the two underlying 
proficiencies. Also, there were four incorrect acceptances of the assumption of unidimensionality with
two-dimensional data sets generated to contain 10 items and 5000 examinees, according to a weak
two-dimensional structure and with a specified correlation of .70 between both underlying proficiencies.

Due to the large number of empty cells in the design attributable to the zero Type II error rates 
across nearly all conditions, it was impossible to undertake the logit-linear analysis as anticipated. 


Nonetheless, it is quite clear from the findings presented in Table 2 that results differed noticeably in only 
one condition and solely for the LISREL7 chi-square statistic. 

Discussion 

The assessment of dimensionality is central to both classical and modern test theories. At the most
basic level, the validity of a score-based inference (what Messick, 1989, refers to as the structural aspect of
construct validity) rests upon our knowledge of the underlying dimensional structure of an item response
matrix. The need to better understand the structure of our data is therefore of the utmost importance. Past
research has shown that NLFA models and accompanying fit statistics are very useful for assessing the
dimensionality of an item response matrix given their relationship to common IRT functions. The
PRELIS2/LISREL8 package in particular offers the practitioner a great deal of flexibility in assessing the fit
of NLFA models to item response data. Nonetheless, significant changes have been implemented in the
most recent version of the software which could have an impact on the behavior of the chi-square
goodness-of-fit statistic and hence the decision made with respect to the underlying structure of a data set.

The findings obtained in this study with respect to unidimensional data sets seem to suggest that
the correction implemented in PRELIS2, regarding the estimation of the asymptotic covariance matrix,
resulted in higher Type I error rates with the chi-square goodness-of-fit statistic. With the
exception of one condition in which data sets were simulated to contain 10 items and 5000 examinees,
empirical Type I error rates obtained with the LISREL8 chi-square statistic were well beyond two standard
errors of the nominal alpha value (.05). It is important, however, to reiterate that the fit statistic provided in
both LISREL packages is chi-square distributed only asymptotically. It is possible that the empirical Type I
error rates estimated with the LISREL8 chi-square fit statistic would adhere more closely to the nominal
alpha level with sample sizes exceeding those that were simulated in the present investigation. Nonetheless,
it is also quite possible that these larger sample sizes would represent unrealistic testing situations. Based
on the logit-linear analysis results, the LISREL7 chi-square fit statistic seems to be preferable to the
LISREL8 chi-square statistic given its greater control over Type I error probability (at least with data sets
that resemble those that were simulated in this study). It should be pointed out that the LISREL7 fit statistic
even appeared to be quite conservative in that it led to very few rejections of the assumption of
unidimensionality.
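
The two-standard-error comparison mentioned above can be checked with a short calculation. The sketch assumes that the standard error is the binomial standard error of a proportion at the nominal alpha with 100 replications per cell; the text does not state how the standard error was computed. The cell rates are the LISREL8 entries of Table 1 divided by 100.

```python
import math


def two_se_band(alpha, replications):
    """Return (se, lower, upper) for a band of two binomial standard errors."""
    se = math.sqrt(alpha * (1.0 - alpha) / replications)
    return se, alpha - 2.0 * se, alpha + 2.0 * se


se, lower, upper = two_se_band(alpha=0.05, replications=100)

# LISREL8 empirical Type I error rates by (test length, sample size), from Table 1.
lisrel8_rates = {
    (10, 2500): 0.13,
    (10, 5000): 0.06,
    (20, 2500): 0.30,
    (20, 5000): 0.19,
}

outside_band = {
    cell: not (lower <= rate <= upper) for cell, rate in lisrel8_rates.items()
}
print(round(se, 4), round(lower, 4), round(upper, 4))
print(outside_band)
```

Under this assumption only the 10-item, 5000-examinee cell falls inside the band, which agrees with the single exception noted in the text.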

Not surprisingly given its inflated Type I error probabilities, rejection rates obtained using the
LISREL8 chi-square fit statistic were high across all simulated two-dimensional conditions. High rejection
rates were also noted with the LISREL7 chi-square fit statistic, with the exception of one condition in which
item response matrices were generated to contain 10 items and 2500 examinees and to reflect a weak
two-dimensional structure with a correlation of .70 between underlying proficiencies. Examining
both empirical Type I error probabilities and rejection rates, it appears that the LISREL7 chi-square
fit statistic should be recommended to the researcher interested in determining whether the assumption of
unidimensionality has been violated with a data set reflecting the conditions simulated in the current
investigation.

Having said this, it is important to state that these findings should be interpreted in light of several 
caveats. First, the reported findings are highly dependent upon the conditions that were simulated and 
generalizations to other configurations should be undertaken cautiously, if at all. For example, the two test 
lengths that were examined obviously did not reflect most testing situations. However, due to memory 
restrictions associated with PRELIS/LISREL7, it was not possible to analyze data sets that contained more
than 20 items. Second, it is important to re-emphasize that the purpose of this study was to examine the 
behavior of both chi-square statistics in several conditions that would hopefully allow us to gather practical 
information regarding both procedures. Obviously, numerous additional simulations should be undertaken 
before making any definitive statements about the Type I and Type II error rates of both statistics. 
It is hoped nonetheless that the results obtained in this study will provide valuable preliminary
information regarding the behavior of the LISREL7 and LISREL8 chi-square goodness-of-fit statistics. It is
also hoped that these initial findings will foster future research that will bridge the areas of IRT and
structural equation modeling with respect not only to goodness-of-fit but also to other issues of common
interest that would benefit from a greater collaboration between both fields.
Table 1

Number of rejections of the assumption of unidimensionality per 100 data sets: Unidimensional conditions

                 10 items              20 items
            N=2500    N=5000      N=2500    N=5000

LISREL7        0         2           0         0
LISREL8       13         6          30        19
Table 2

Number of rejections of the assumption of unidimensionality per 100 data sets: Two-dimensional conditions

Test     Sample   Dimension   Proficiency   LISREL    Number of
length   size     dominance   correlation   version   rejections

10       2500     80%:20%     0.00          LISREL7   100
10       2500     80%:20%     0.70          LISREL7    50
10       2500     50%:50%     0.00          LISREL7   100
10       2500     50%:50%     0.70          LISREL7   100
10       2500     80%:20%     0.00          LISREL8   100
10       2500     80%:20%     0.70          LISREL8   100
10       2500     50%:50%     0.00          LISREL8   100
10       2500     50%:50%     0.70          LISREL8   100
10       5000     80%:20%     0.00          LISREL7   100
10       5000     80%:20%     0.70          LISREL7    96
10       5000     50%:50%     0.00          LISREL7   100
10       5000     50%:50%     0.70          LISREL7   100
10       5000     80%:20%     0.00          LISREL8   100
10       5000     80%:20%     0.70          LISREL8   100


10 